Reduction of distortion in speech signal time compression systems



Aug. 19, 1969. A. J. PRESTI. 3,462,555. Filed March 23, 1966. 2 Sheets.


United States Patent Office. U.S. Cl. 179-15.55. 8 Claims.

ABSTRACT OF THE DISCLOSURE

One attractive way of time-compressing speech or other message signals is to discard speech segments periodically and join the remaining ones to form a continuous signal. Unfortunately, this form of chopping gives rise to distortion in the audible frequency range because of amplitude and frequency discontinuities at each splice. Commonly, an attempt is made to remove this distortion by filtering the chopped signal to remove components outside of a desired band. Superior temporal processing is achieved, however, by dividing a speech signal into a number of contiguous sub-bands before chopping and by passing each chopped sub-band signal through a filter identical to that used originally to generate the sub-band signal before recombination. Distortion energy outside of each filter pass-band, including that within audible frequencies passed by other filters, is thus removed.

This invention relates to the production of time compressed or time expanded signals, and in particular to the production of temporally processed speech characterized by a low noise level. It is an object of this invention to reduce substantially the noise level in such processed message signals.

Speech is composed of sequences of sounds separated by silent intervals. Many of the sound variations are redundant in the sense that they are largely repetitions and are of little use to a listener in determining the meaning of the speaker. In theory, some of the periods of a speech wave may be discarded and the remaining ones joined together to compress the time scale of the speech waves, or some of the periods may be repeated and joined together to expand the time scale of the waves. Unfortunately, reconstruction of a "chopped" signal through the joining together of slightly dissimilar segments gives rise to splicing noise which lowers the quality of the processed speech.

Abrupt changes in amplitude of a spliced signal may be reduced by establishing the transition between successive periods in one-to-one correspondence with the fundamental period of the speech signal. However, this technique requires a variable chopping rate which, in turn, leads to extremely complex and expensive equipment. Moreover, it does not eliminate noise stemming from abrupt changes in the momentary frequency of joined segments.

Another technique for reducing discontinuity noise involves shifting the entire speech signal in frequency prior to the time scale processing. Processing is carried out at the new, and generally higher frequency. Gross noise resulting from the processing is removed by filtering, and the signal is restored to its original frequency assignment. As a result of the shift in frequency, splicing noise is effectively reduced.

These techniques thus attempt to improve the quality of temporally processed speech by reducing the level of splicing noise evident in the reconstructed speech signal.

This invention, on the other hand, improves the quality of time compressed or time expanded message signals, e.g., speech signals, by reducing the level of splicing noise in individual, contiguous, frequency subbands which together encompass the entire frequency range of an applied signal. Each subband accommodates no more than one octave. By thus reducing splicing noise associated with individual frequency subbands, the residual noise associated with the reconstructed signal is markedly reduced. Moreover, the usual filtering of the gross signal may take place as desired.

According to the invention, splicing noise is removed from speech signals, or the like, whose time scale is to be altered for transmission through a chopping and joining process, by dividing an applied signal into a plurality of contiguous subbands, each with a bandwidth less than one octave wide, prior to time scale processing. Signals in each of the subbands are then processed, preferably in synchronism, to alter their time scales, and each is individually filtered by one of a set of automatically adjusted filters to remove noise present outside of the designated subband limits. After individual filtering, the subband signals are combined to form a representation of the applied signal. If desired, the combined signal may be additionally processed to alter its time or frequency scale by recording or the like.

This invention will be more fully understood from the following detailed description of the operation of preferred embodiments thereof taken together with the following drawings in which:

FIG. 1 is a schematic block diagram of one embodiment of this invention;

FIG. 2 is an illustrative diagram of a signal compressor-expander employed in this invention; and

FIGS. 3A and 3B illustrate typical means for converting frequency compressed or expanded speech to time compressed or expanded speech.

One embodiment of this invention is shown in FIG. 1. While the operation of this embodiment will be described in terms of signal time compression, the description is equally applicable to signal time expansion.

In FIG. 1, signals to be compressed in time are applied to a bank of n parallel, contiguous, bandpass filters 10. Each filter 10 passes a subsignal occupying a selected frequency band, less than one octave of the frequencies of the applied signal, and the n filters together pass n subsignals occupying n contiguous frequency bands selected from the entire audible speech range. The passbands of filters 10 may, depending on design considerations, be unequal. The unique, and possibly unequal, filter delay times are compensated for in delay elements 20. Each subsignal is processed in a corresponding one of n signal compressor-expanders 30, e.g., by the omission of selected signal intervals, and the joining together, after time expansion, of the remaining intervals. As a consequence, the bandwidth of the signal is reduced.

A typical compressor-expander is shown schematically in FIG. 2. The subsignal to be processed is recorded on tape 306 moving at velocity v relative to recording head 301. After traveling a distance l the recorded subsignal is read off tape 306 by one of a plurality of readout heads 305 placed uniformly around the circumference of rotating disk 304. The spacing of heads 305 is determined by the fraction of the circumference of disk 304 contacted by tape 306. In FIG. 2, one quarter of the circumference is contacted by tape 306 so that four readout heads, 305-1 through 305-4, spaced ninety degrees apart, are utilized. Tape 306 forms an endless loop and the information placed on the tape by record head 301 is removed by erase head 303.

As disk 304 rotates in a clockwise direction at an angular velocity equivalent to a tangential velocity u at its circumference, movement of heads 305 in the same direction as tape 306 increases the time required to read out each wavelength of each frequency component recorded on tape 306. A wavelength λ is recorded on tape 306 in time τ1 equal to λ/v. Since the velocity of the tape relative to the readout heads 305 is v-u, the time τ2 necessary to read out the same wavelength is λ/(v-u). Since frequency f is equal to the reciprocal of the period τ, the ratio of the frequency f2 of the output subsignal from heads 305 to the frequency f1 of the input subsignal is

$$\frac{f_2}{f_1} = \frac{\tau_1}{\tau_2} = \frac{v-u}{v} \qquad (1)$$

The velocity u of disk 304 determines the amount of time compression to be achieved. This velocity is automatically controlled by a signal from master control 32 (FIG. 1). The desired speech compression ratio is set by turning a dial, not shown, on control 32 which adjusts a voltage divider. The voltage level of the signal from the divider controls motor speed control 31 which in turn controls the velocity of the motors which drive disks 304.
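The relation between head speed and output frequency described above, f2/f1 = (v - u)/v, can be illustrated with a short numerical sketch. The function names and the speeds used in the example are assumptions chosen for illustration only and do not appear in the disclosure.

```python
def frequency_ratio(v: float, u: float) -> float:
    """Return f2/f1 = (v - u) / v.

    v: tape speed past the record head (any consistent unit, must be > 0)
    u: tangential speed of the readout heads, positive when the heads
       move in the same direction as the tape
    """
    if v <= 0:
        raise ValueError("tape speed v must be positive")
    return (v - u) / v


def output_frequency(f1: float, v: float, u: float) -> float:
    """Frequency read out by the moving heads for an input component f1."""
    return f1 * frequency_ratio(v, u)


# Hypothetical speeds: heads moving at half the tape speed halve every
# frequency component of the subsignal.
print(output_frequency(1000.0, v=15.0, u=7.5))  # 500.0
```

Under the same sign convention, a negative u (heads moving against the tape) gives a ratio greater than one, which corresponds to an increase in pitch and bandwidth.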

At the start of a given time period Δt, head 305-1 (FIG. 2) and point 1 on tape 306 are together at point a. At the end of time Δt, head 305-1 has moved to point b, as shown, point 1 on tape 306 has moved to point c, as shown, and the information on that portion of tape 306 between points a and b has yet to be read out. However, head 305-1 is about to break contact with the tape, while head 305-2 is about to make contact with a new section of tape at point a. Thus the information on that portion of tape 306 between points a and b at the end of time Δt is never read out and in effect is discarded. There is, however, no break in the output signal on lead 307 from heads 305 because just as head 305-1 stops reading out information, head 305-2 begins to read out information. Ideally, the distance ab should correspond approximately to the wavelength of the fundamental frequency so that only complete cycles of redundant information are discarded.

Thus, compressor-expanders 30 discard selected portions of an applied speech signal and expand the remaining portions to occupy the same time interval as the applied signal. However, frequency compression alters the sound of the signal. If sufficient compression is employed, the resulting speech is so altered as to destroy its meaning to a listener. However, in a fashion to be discussed hereinafter, the applied signal frequency scale may be restored for the output signals.
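The discard-and-join action of the rotating heads can be pictured with a sampled-signal analogy. The sketch below is only an illustration under stated assumptions: the apparatus described operates on magnetic tape, not on samples, and the names chop_and_join and discarded_fraction are hypothetical. The fraction u/v follows from the geometry described above, in which the heads travel a distance proportional to u while a tape length proportional to v passes point a.

```python
def chop_and_join(samples, keep, discard):
    """Keep `keep` samples, skip `discard` samples, and repeat.

    The kept segments are joined end to end, which is the sampled
    analogue of one head leaving the tape as the next one arrives.
    """
    if keep <= 0 or discard < 0:
        raise ValueError("keep must be positive and discard non-negative")
    period = keep + discard
    joined = []
    for start in range(0, len(samples), period):
        joined.extend(samples[start:start + keep])
    return joined


def discarded_fraction(v, u):
    """Fraction of the recorded tape that is never read out (u / v)."""
    if v <= 0 or not (0 <= u <= v):
        raise ValueError("expected 0 <= u <= v and v > 0")
    return u / v


print(chop_and_join(list(range(10)), keep=3, discard=2))  # [0, 1, 2, 5, 6, 7]
```

In the apparatus the retained segments are also stretched in time on readout; the sketch shows only which portions survive.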

If disk 304 and heads 305 rotate in the opposite direction, that is, against the motion of tape 306, signal segments are compressed on the time scale and repeated with a consequent increase in signal pitch and signal bandwidth.

The output subsignal from each compressor-expander 30, though continuous, contains discontinuities. These discontinuities arise because the amplitude and frequency of an applied speech signal are in general constantly changing and the discarding or repeating of even a short portion of each subsignal (represented by the section of tape 306 between points a and b) may cause sharp changes in amplitude or frequency, or both, at the joints between the remaining subsignal segments. Changes of this sort give rise to noise, termed splicing noise, in the output signal. Discontinuities occur periodically as consecutive readout heads 305 break and make contact with the tape. While the frequency of these discontinuities is constant, the amplitudes of these discontinuities vary randomly and thus the energy spectrum of the resulting noise is continuous.
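The size of the amplitude jump at a splice can be illustrated for the simplest case of a steady sine wave. This is a sketch under that assumption only; real speech also changes in amplitude and frequency across the splice, and the function name splice_jump is hypothetical.

```python
import math


def splice_jump(freq_hz, t_cut, discard_s):
    """Amplitude step at a splice in a unit sine wave.

    The signal is cut at time t_cut, the next discard_s seconds are
    dropped, and the remainder is joined directly to the cut point.
    """
    before = math.sin(2 * math.pi * freq_hz * t_cut)
    after = math.sin(2 * math.pi * freq_hz * (t_cut + discard_s))
    return abs(after - before)


# Discarding exactly one period of a 100 Hz tone leaves no step.
print(splice_jump(100.0, t_cut=0.00125, discard_s=0.01) < 1e-9)  # True

# Discarding a quarter period at a zero crossing leaves a full-scale step.
print(round(splice_jump(100.0, t_cut=0.0, discard_s=0.0025), 6))  # 1.0
```

The first case corresponds to discarding only complete cycles; the second shows how an arbitrary discard length produces the step that is heard as splicing noise.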

The division of a speech signal into a large number of subsignals, on the order of 50, more or less, occupying contiguous narrow frequency bands, is based on the recognition that a significant part of the splicing noise generated by compressing or expanding each subsignal can be made to fall outside the frequency band occupied by the subsignal. Ideally, the greater the noise generated by compressing or expanding a subsignal, the narrower that subsignal's bandwidth should be, to ensure that a large part of the noise associated with that subsignal falls outside the bandwidth of the processed subsignal. In this invention, each subsignal occupies a bandwidth less than one octave.

To remove a significant portion of the splicing noise present in the output subsignal from each one of compressor-expanders 30 (FIG. 1), each output subsignal is passed through a bandpass filter 40, similarly proportioned to pass no more than one octave of the range of the applied signal. Each bandpass filter 40 therefore removes noise energy outside its passband, including noise within the frequency bands of the other output subsignals. While the number of filters in set 40 of bandpass filters is equal to the number in set 10, the frequency ranges covered by corresponding filters in each set are in general not equal. The frequency ranges and center frequencies f2 of the filters in set 40 are related to the frequency ranges and center frequencies f1 of the filters in set 10 by Equation 1. Further, because the degree of compression or expansion of the input signal generally varies from one selected input signal to another, bandpass filters 40 have variable center frequencies and passbands.
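The tuning of the output filters can be sketched as a scaling of each input band by the frequency ratio f2/f1 = (v - u)/v. The band edges below are hypothetical values, and the helper names are assumptions made for this illustration; the sketch uses the stricter "less than one octave" test.

```python
def is_sub_octave(low_hz, high_hz):
    """True when the band spans less than one octave (high < 2 * low)."""
    return 0 < low_hz < high_hz < 2 * low_hz


def output_band(input_band, ratio):
    """Scale the edges of one input band by the ratio f2/f1."""
    low, high = input_band
    return (low * ratio, high * ratio)


def output_bank(input_bank, ratio):
    """Scale every band of the input filter set by the same ratio."""
    return [output_band(band, ratio) for band in input_bank]


input_bank = [(1000.0, 1500.0), (1500.0, 2000.0)]  # hypothetical edges, Hz
print(output_bank(input_bank, ratio=0.5))
# [(500.0, 750.0), (750.0, 1000.0)]
```

Because both edges of a band are multiplied by the same factor, a band that is narrower than one octave at the input remains narrower than one octave at the output, and contiguous bands remain contiguous.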

In a preferred form of the invention, the center frequencies and passbands of filters 40 are varied automatically by filter controls 41 in response to a signal from master control 32. The desired speech compression ratio is set by turning a dial, not shown, on control 32 which adjusts a voltage divider. The voltage level of the signal from the divider is proportional to the speech compression or expansion desired. The divider voltage level controls, through normally closed switch 34, the operation of filter controls 41. Each filter 40 contains a signal responsive device, such as a voltage controlled capacitor, controlled by a signal from filter control 41. If desired, variable inductors can be used instead of variable capacitors. Switch 33 is normally open, thus preventing any variations in the center frequencies of filters 10 from their normal values.

The number of filters in each set of bandpass filters 10 and 40 is determined by the bandwidth of each filter and the bandwidth of applied speech signals. The narrower the bandwidth of each filter in set 40, the slower its response time and the greater its smoothing effect on the discontinuities between subsignal segments. A bandwidth of approximately 60 cycles per second for each filter of set 40 is justified in regions of maximum splicing noise, while in regions of low splicing noise bandwidths of up to several hundred or more cycles per second are appropriate.
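How the filter count follows from the two bandwidths can be sketched for the simplest layout, a bank of equal-width bands. The 200 to 3200 cycles-per-second speech range and the function names are assumptions made for this example; the disclosure also allows unequal bandwidths.

```python
import math


def uniform_bands(f_low, f_high, bandwidth):
    """Split [f_low, f_high] into contiguous bands of equal width."""
    if not (0 < f_low < f_high) or bandwidth <= 0:
        raise ValueError("expected 0 < f_low < f_high and bandwidth > 0")
    count = math.ceil((f_high - f_low) / bandwidth)
    bands = []
    for i in range(count):
        low = f_low + i * bandwidth
        high = min(low + bandwidth, f_high)  # clip the last band
        bands.append((low, high))
    return bands


def all_sub_octave(bands):
    """True when every band spans less than one octave."""
    return all(high < 2 * low for low, high in bands)


bands = uniform_bands(200.0, 3200.0, 60.0)  # assumed speech range, Hz
print(len(bands), bands[0], bands[-1])
# 50 (200.0, 260.0) (3140.0, 3200.0)
print(all_sub_octave(bands))  # True
```

With these assumed figures, a uniform 60 cycle-per-second bandwidth gives a bank of 50 filters, in line with the number suggested earlier in the description.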

Since the delays in the two sets 10 and 40 of parallel-connected bandpass filters are not necessarily equal from band to band, compensating delays 20 ensure that all the subsignals are in time synchronism upon emerging from bandpass filters 40. Unequal filter delays can also be compensated for by varying the distance l, shown in FIG. 2, between recording head 301 and point a on the path of tape 306, as a function of delays of the corresponding subsignal.
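The choice of compensating delays can be sketched as padding every channel up to the delay of the slowest one. The sketch assumes, for simplicity, that the delays of the two filters in a channel simply add and that they are known in advance; the delay values and the function name are hypothetical.

```python
def compensating_delays(input_filter_delays, output_filter_delays):
    """Extra delay per channel so that all channels emerge together.

    Each argument lists one delay per subband, in the same time unit.
    """
    if len(input_filter_delays) != len(output_filter_delays):
        raise ValueError("both filter sets must have the same number of bands")
    totals = [a + b for a, b in zip(input_filter_delays, output_filter_delays)]
    longest = max(totals)
    return [longest - total for total in totals]


# Hypothetical delays in milliseconds for a three-band system.
print(compensating_delays([2.0, 5.0, 3.0], [1.0, 4.0, 3.0]))  # [6.0, 0.0, 3.0]
```

The same per-channel figures could instead be realized as differences in the record-head spacing l, since a longer tape path before readout is itself a delay.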

Upon emerging from filters 40, the filtered subsignals are summed in network 50. In the compression mode of operation, signals from network 50 are compressed in frequency (frequency or pitch lower and bandwidth reduced) but not in time, that is, output signals occupy the same interval as input signals. Because of the frequency alteration, however, they may not satisfactorily be perceived by a listener. In order that the continuous signal from network 50 reproduce faithfully the original speech frequencies, the signal is accordingly recorded in recorder 60 at one speed over the input speech interval Δt, and then played back at a higher speed over a lesser time interval Δt1. The interval Δt is selected so that the frequencies f2 in the chopped, but continuous, signal are increased to the original speech frequencies f1.

In the expansion mode of operation, signals from network 50 are expanded in frequency (frequency higher and bandwidth greater) but not in time. Recorder-playback system 60 (FIG. 1) may thus be employed to restore the natural frequencies to chopped speech signals by recording compressed signals at one speed and playing recorded signals out at a lower speed. If, however, input speech signals to filters 10 are recordings played at a selected speed higher than the recording speed, the signals from network 50 will be a natural sounding, time compressed version of the input. In this case, recorder 60 can be dispensed with, but the frequency range covered by filter set 10 must be increased to accommodate the higher frequency components of the input speech signal. This is done by closing normally open switch 33 while opening normally closed switch 34. The center frequencies and passbands of filters 10 are then appropriately varied in response to a signal from control 32 by the operation of filter controls 11.

A suitable recorder playback system 60 is shown schematically in FIGS. 3A and 3B. In FIG. 3A, for speech compression, speech from network 50 is recorded on tape 600 moving at velocity v2 past recording head 601. Length d of tape 600 is recorded and stored on reel 602 in time period Δt2, which is proportional to 1/f2. To read out the recorded signal, tape 600 is rewound on reel 612 and then is pulled off reel 612 past readout head 611 at velocity v2', as shown in FIG. 3B. The same length d of tape 600 is read out in time Δt1, which is proportional to 1/f1. Since the same length d is traversed in both cases,

$$d = v_2\,\Delta t_2 = v_2'\,\Delta t_1, \qquad \frac{v_2'}{v_2} = \frac{\Delta t_2}{\Delta t_1} = \frac{f_1}{f_2}$$

so that for compression, where f2 is less than f1, velocity v2' exceeds velocity v2. When speech is to be expanded rather than compressed, velocity v2' must be less than velocity v2 to reproduce the expanded speech with natural sounding frequencies. A plurality of n recorder systems 60 may be employed with the n signal compressor-expanders 30-1 through 30-n, if desired.
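The choice of playback speed can be illustrated numerically. The sketch assumes only that the same tape length is recorded and read out, so that playback speed and playback time scale inversely with the frequency ratio f2/f1; the function names and the speeds shown are hypothetical.

```python
def playback_speed(record_speed, ratio):
    """Playback speed that restores the original frequencies.

    ratio is f2/f1, the frequency scaling applied by the
    compressor-expanders (below 1 for compression, above 1 for expansion).
    """
    if record_speed <= 0 or ratio <= 0:
        raise ValueError("record_speed and ratio must be positive")
    return record_speed / ratio


def playback_interval(record_interval, ratio):
    """Duration of the restored signal for a given recording duration."""
    if record_interval < 0 or ratio <= 0:
        raise ValueError("record_interval must be >= 0 and ratio > 0")
    return record_interval * ratio


# Compression with f2/f1 = 0.5: play back twice as fast, in half the time.
print(playback_speed(7.5, 0.5))      # 15.0
print(playback_interval(10.0, 0.5))  # 5.0
```

For expansion the ratio exceeds one, so the computed playback speed falls below the recording speed and the playback interval grows.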

Signal compressor-expanders 30 shown in FIG. 2 may comprise separate channels of a high speed, multiple track recorder. In this case the spacing l of the record heads 301 from point a can vary from track to track to compensate for different delays in the bandpass filters 10 and 40 associated with each speech band.

Other embodiments of this invention will become obvious to those skilled in the speech compression art.

What is claimed is:

1. Apparatus for processing a message signal which comprises:

means for dividing an applied message signal into a first set of subsignals occupying a corresponding set of contiguous frequency subbands each of said subbands accommodating less than one octave of the frequency spectrum of said applied message signal;

means for generating from said first set of subsignals a corresponding second set of subsignals altered in frequency bandwidth compared to the frequency bandwidth of said first set;

means for filtering each subsignal of said compressed second set of subsignals to remove noise outside its subband; and

means for combining all of said filtered subsignals to form a representation of said applied signal.

2. Apparatus as in claim 1 wherein said means for generating comprise:

means for periodically discarding message signals segments of selected duration in each subsignal of said first set of subsignals, and for joining the segments of said message signal that remain to form a continuous signal subsignal in each of said contiguous subbands.

3. Apparatus as in claim 1 wherein said means for generating comprise:

means for periodically repeating message signal segments of selected duration in each subsignal of said first set of subsignals, and for joining the resulting segments to form a continuous signal in each of said contiguous subbands.

4. Apparatus for processing a message signal which comprises:

means for generating a control signal;

means for dividing the frequency spectrum of an applied signal into a plurality of contiguous subbands each of which accommodates less than one octave of said applied signal;

means responsive to said control signal for temporally processing signals in each of said subbands;

means for filtering each of said temporally processed signals to remove noise outside its subband;

means for adjusting said filtering means in response to said control signal; and

means for combining all of said filtered, processed signals to form a representation of said applied signal.

5. Apparatus as in claim 4 wherein said means for generating comprise:

means responsive to said control signal for periodically discarding message signal segments of selected duration in each of said plurality of contiguous subbands, and for joining the message signal segments that remain to form a continuous signal in each of said contiguous subbands.

6. Apparatus as in claim 4 wherein said means for generating comprise:

means responsive to said control signal for periodically repeating message signal segments of selected duration in each of said plurality of contiguous subbands, and for joining the resulting segments to form a continuous signal in each of said contiguous subbands.

7. Apparatus which comprises:

means for producing a control signal;

means for dividing an applied signal into a first set of subsignals occupying a large number of contiguous, relatively narrow frequency bands;

means responsive to said control signal for generating from said first set of subsignals a corresponding second set of subsignals altered in frequency bandwidth compared to the bandwidth of said first set;

means for filtering each subsignal in said second set of subsignals to remove noise outside its relatively narrow frequency band;

means for automatically adjusting the center frequencies and passbands of said filtering means in response to said control signal;

means for synchronizing in time each of said filtered second set of subsignals;

means for combining said synchronized filtered subsignals to produce an intermediate signal altered in frequency bandwidth compared to the bandwidth of said applied signal; and

means for converting said intermediate signal to an output signal containing approximately the same component frequencies as said applied signal.

8. Apparatus as described in claim 7 wherein said converting means comprise:

means for recording said intermediate signal at a first selected speed; and

means for reading out said recorded intermediate signal at a second speed selected so that the frequencies of the components of said readout signal are substantially identical to the frequencies of the components of said applied signal.

RALPH D. BLAKESLEE, Primary Examiner

J. A. BRODSKY, Assistant Examiner

U.S. Cl. X.R. 179-1, 15, 100.2

